Introduction to HPC
Snakemake Workflow Manager
Manuel Holtgrewe
Berlin Institute of Health at Charité
Session Overview
Aims
- Understand the need for reproducibility when using computational methods.
- Learn about the role of workflow managers.
- Learn to use the Snakemake workflow manager.
- Use Snakemake together with Conda and Slurm.
Actions
- Installing and using Snakemake.
- Writing modular Snakemake workflows.
- Use Snakemake with Conda for software management and Slurm for execution.
Reproducibility in Computational Sciences
- Reproducibility in General
- Bioinformatics Reproducibility Issues
Reproducibility in General
- Definition
- Reproducibility Crisis
- Reproducibility vs. Generalizability
Issue: Software (1)
Your code with Git
Issue: Software (2)
Other code
- SBOM
- conda
- apptainer images
Issue: Parameters
- Keep a record
- Random numbers
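A minimal sketch of both points, assuming a toy analysis that draws random numbers: the parameters and the seed are written to a JSON record, and a dedicated, explicitly seeded generator makes the run repeatable. The function name `run_with_record` and the parameter `n_draws` are hypothetical and only serve as illustration.

```python
import json
import random


def run_with_record(params, seed):
    """Run a toy analysis and return its result plus a JSON run record."""
    # Use a dedicated generator so the seed fully determines the draws.
    rng = random.Random(seed)
    sample = [rng.randint(0, 100) for _ in range(params["n_draws"])]
    # The record holds everything needed to repeat the run.
    record = json.dumps({"params": params, "seed": seed}, sort_keys=True)
    return sample, record


first, record = run_with_record({"n_draws": 5}, seed=42)
second, _ = run_with_record({"n_draws": 5}, seed=42)
print(record)
print(first == second)  # same parameters and seed give the same draws
```

Storing the record next to the results means a later rerun does not depend on anyone remembering which values were used.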
Issue: Data
- Keep it safe
- Keep integrity
- Keep rights
- … of researcher
- … of individual (if human)
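One common way to check data integrity is to record a checksum when the data arrives and compare it later. The following is a minimal sketch using SHA-256 from the Python standard library; the file name `reads.txt` and its content are made up for the example.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


with tempfile.TemporaryDirectory() as tmp:
    data_file = Path(tmp) / "reads.txt"  # hypothetical data file
    data_file.write_bytes(b"ACGT\n")
    recorded = sha256_of(data_file)  # store this alongside the data
    unchanged = sha256_of(data_file) == recorded
    data_file.write_bytes(b"ACGA\n")  # simulate silent corruption
    modified = sha256_of(data_file) != recorded

print(unchanged, modified)
```

Reading in chunks keeps memory use constant, which matters for the large files typical of sequencing data.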
Workflow Managers
- Introduction
- Snakemake
- Nextflow
- Galaxy
Introduction
- Definition / what it does
- Tasks:
- orchestrate jobs
- keep logs
- resumability (continue after an interruption or failure)
Snakemake
- Python-based
- Similar to Unix Make
- “Back to front” with dependencies
- Explicit names, visible to the user
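The "back to front" idea can be sketched in a few lines of plain Python: start from the requested target file, look up which rule produces it, and recurse into that rule's inputs until existing files are reached. This is a simplified illustration and not Snakemake's implementation; the rule table and file names are hypothetical.

```python
# Hypothetical rules: each output file maps to (input files, action name).
RULES = {
    "report.txt": (["counts.txt"], "summarize"),
    "counts.txt": (["reads.txt"], "count"),
}


def plan(target, existing, order=None):
    """Return the actions needed to build target, inputs first."""
    if order is None:
        order = []
    if target in existing:
        return order  # nothing to do, the file is already there
    if target not in RULES:
        raise ValueError(f"no rule produces {target!r}")
    inputs, action = RULES[target]
    for name in inputs:
        plan(name, existing, order)  # resolve dependencies first
    if action not in order:
        order.append(action)
    return order


steps = plan("report.txt", existing={"reads.txt"})
print(steps)
```

The user names the final file explicitly, and the order of the steps follows from the file names alone.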
Nextflow
- DSL based on Groovy
- data “hidden” from the user
- front-to-back based on pipeline
Galaxy
- graphical
- central server
- backing cluster
- not at BIH
The Snakemake Workflow Manager
- Introduction
- Installation
- Our First Workflow
- A Real Workflow
Our First Workflow (3)
- read parameters from sample sheet
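Because a Snakefile is Python-based, a sample sheet can be read with ordinary Python and turned into a list of target files. The sketch below assumes a comma-separated sheet with the hypothetical columns `sample` and `fastq`, and a hypothetical output layout `results/<sample>.bam`; the sheet is held in a string here so the example is self-contained.

```python
import csv
import io

# Hypothetical sample sheet; in a real workflow this would be a file.
SAMPLE_SHEET = """sample,fastq
A,data/A.fastq.gz
B,data/B.fastq.gz
"""


def read_samples(handle):
    """Map each sample name to its row of the sample sheet."""
    return {row["sample"]: row for row in csv.DictReader(handle)}


samples = read_samples(io.StringIO(SAMPLE_SHEET))
# One target file per sample, in sheet order.
targets = [f"results/{name}.bam" for name in samples]
print(targets)
```

Adding a sample then means adding one line to the sheet rather than editing the workflow.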
A Real Workflow (1)
the first version
A Real Workflow (2)
resource specification
A Real Workflow (3)
config files
A Real Workflow (4)
resource specification
A Real Workflow (5)
temporary files
A Real Workflow (6)
logging
A Real Workflow (7)
- wildcard constraints
- temporary files
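Why wildcard constraints matter can be shown with plain regular expressions. By default a Snakemake wildcard matches the regular expression `.+`, so a pattern with several wildcards can split a file name in an unintended way. The sketch below is a simplified stand-in for that matching, not Snakemake's own code; the helper `pattern_to_regex` and the file name are hypothetical.

```python
import re


def pattern_to_regex(pattern, constraints=None):
    """Turn 'a/{x}.{y}.txt' into a regex with one named group per wildcard."""
    constraints = constraints or {}
    # Splitting on a capturing group alternates literal text and names.
    parts = re.split(r"\{(\w+)\}", pattern)
    regex = ""
    for index, part in enumerate(parts):
        if index % 2 == 0:
            regex += re.escape(part)  # literal text
        else:
            regex += f"(?P<{part}>{constraints.get(part, '.+')})"
    return re.compile(regex)


pattern = "results/{sample}.{kind}.txt"
path = "results/A.sorted.dedup.txt"

loose = pattern_to_regex(pattern).fullmatch(path).groupdict()
strict = pattern_to_regex(pattern, {"sample": "[^.]+"}).fullmatch(path).groupdict()
print(loose)
print(strict)
```

Without a constraint the greedy match assigns `A.sorted` to `sample`; forbidding dots in `sample` yields `A`, which is what the naming scheme intends.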
Using the Slurm Runner (1)
how does the integration work?
Using the Slurm Runner (2)
running it
Using the Slurm Runner (3)
looking at logs etc
Tricks (3)
- rerun with increasing resources
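In Snakemake, a resource value may be a callable that receives the current attempt number, so a job that failed can be retried with more memory. The function below is a minimal sketch of such a callable; the name `mem_mb_for_attempt` and the base value of 4000 MB are assumptions for illustration, and retries must be enabled when running Snakemake (for example with its retries option, depending on the version in use).

```python
BASE_MEM_MB = 4000  # hypothetical memory request for the first attempt


def mem_mb_for_attempt(wildcards, attempt):
    """Scale the memory request linearly with the attempt number."""
    return BASE_MEM_MB * attempt


# Attempts are counted from 1; wildcards are not needed for this sketch.
requests = [mem_mb_for_attempt(None, attempt) for attempt in (1, 2, 3)]
print(requests)
```

In a rule this would be referenced from the resources section, so only jobs that actually fail pay for the larger allocation.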
Bring Your Own Project
🫵 Where can you apply what you have learned in your PhD project?
This is not the end…
… but all for this session
Recap
- Reproducibility
- Challenges and resolution approaches
- The role of workflow managers
- Snakemake
- Writing modular Snakefiles
- Using Snakemake with conda for reproducible software installations
- Using Snakemake with Slurm for scaling up your workflows